Torchair graph812 cov #2337
base: main
Conversation
Signed-off-by: p00465316 <[email protected]>
Signed-off-by: taoyuxiang <[email protected]>
Code Review
This pull request adds support for Qwen models with torchair, including new model files, tests, and configuration changes. My review focuses on several critical issues. I've identified a typo in a test patch target that would cause it to fail. More importantly, I found two critical bugs in the `rotary_embedding` implementation: one causing a tensor shape mismatch due to incorrect `unsqueeze` operations, and another related to an incorrect monkey-patch target that would prevent the new rotary embedding logic from being used. These issues need to be addressed to ensure the correctness of the new functionality.
```python
def stubbed_get_state(ep_size, with_prefill, is_deepseek_v3_r1):
    return _get_fused_moe_state(16, with_prefill, is_deepseek_v3_r1)

with patch('mockif._get_fused_moe_state', side_effect=stubbed_get_state):
```
The target string for `patch` seems incorrect. `'mockif._get_fused_moe_state'` will likely cause a `ModuleNotFoundError` because `mockif` is not a valid module. Based on the imports, you probably intended to patch `_get_fused_moe_state` from `vllm_ascend.ascend_forward_context`.
Suggested change:
```diff
-with patch('mockif._get_fused_moe_state', side_effect=stubbed_get_state):
+with patch('vllm_ascend.ascend_forward_context._get_fused_moe_state', side_effect=stubbed_get_state):
```
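For context on why the target string matters: `unittest.mock.patch` imports the module named in the target and rebinds the attribute there, so the dotted path has to point at a real module where the name is looked up. Below is a minimal sketch of the corrected test setup, assuming the test imports `_get_fused_moe_state` from `vllm_ascend.ascend_forward_context`; the module and function names are taken from the snippet above and are not re-verified against the repository.

```python
from unittest.mock import patch

from vllm_ascend.ascend_forward_context import _get_fused_moe_state


def stubbed_get_state(ep_size, with_prefill, is_deepseek_v3_r1):
    # Ignore the caller-supplied ep_size and pin it to 16 for the test,
    # as in the original stub.
    return _get_fused_moe_state(16, with_prefill, is_deepseek_v3_r1)


# Patch the attribute on the module it lives in; a non-existent module such
# as 'mockif' raises ModuleNotFoundError as soon as the context manager starts.
with patch('vllm_ascend.ascend_forward_context._get_fused_moe_state',
           side_effect=stubbed_get_state):
    ...  # exercise the code path that calls _get_fused_moe_state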
```python
cos = cos.unsqueeze(-2).unsqueeze(-2)
sin = sin.unsqueeze(-2).unsqueeze(-2)

query = query.unsqueeze(1)
key = key.unsqueeze(1)

q_embed, k_embed = torch_npu.npu_apply_rotary_pos_emb(query, key, cos, sin)
```
The `unsqueeze(1)` calls on `query` and `key` introduce an extra dimension, which leads to an incorrect output shape from `rope_forward`. The attention mechanism expects a tensor of shape `(num_tokens, num_heads * head_size)`, but this implementation will produce a tensor with an extra dimension of size 1, e.g., `(num_tokens, 1, num_heads * head_size)`. This will cause a shape mismatch error in the subsequent attention computation. The `unsqueeze` calls on `cos` and `sin` should also be adjusted to a single `unsqueeze` to ensure correct broadcasting.
Suggested change:
```diff
-cos = cos.unsqueeze(-2).unsqueeze(-2)
-sin = sin.unsqueeze(-2).unsqueeze(-2)
-query = query.unsqueeze(1)
-key = key.unsqueeze(1)
-q_embed, k_embed = torch_npu.npu_apply_rotary_pos_emb(query, key, cos, sin)
+cos = cos.unsqueeze(-2)
+sin = sin.unsqueeze(-2)
+q_embed, k_embed = torch_npu.npu_apply_rotary_pos_emb(query, key, cos, sin)
```
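To make the broadcasting concrete, here is a pure-PyTorch shape sketch of the suggested fix. `torch_npu.npu_apply_rotary_pos_emb` is stood in for by an explicit rotate-half formulation, and the dimensions (and the rotation convention) are illustrative assumptions rather than values taken from the PR.

```python
import torch

num_tokens, num_heads, head_size = 4, 8, 64

# cos/sin as produced by the rotary cache: one row per token.
cos = torch.randn(num_tokens, head_size)
sin = torch.randn(num_tokens, head_size)

# query/key in the per-head layout used inside rope_forward.
query = torch.randn(num_tokens, num_heads, head_size)
key = torch.randn(num_tokens, num_heads, head_size)

# A single unsqueeze(-2) yields (num_tokens, 1, head_size), which broadcasts
# over the head axis; no extra batch-like dimension is added to query/key.
cos = cos.unsqueeze(-2)
sin = sin.unsqueeze(-2)


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


q_embed = query * cos + rotate_half(query) * sin
k_embed = key * cos + rotate_half(key) * sin

# Flattening the heads restores the (num_tokens, num_heads * head_size)
# layout that the downstream attention op expects.
assert q_embed.reshape(num_tokens, -1).shape == (num_tokens, num_heads * head_size)
assert k_embed.shape == key.shape
```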
```
@@ -283,7 +378,8 @@ def deepseek_rope_init_func(
        device="npu")


RotaryEmbedding.forward_oot = rope_forward_oot
RotaryEmbedding.__init__ = qwen_rope_init_func
RotaryEmbedding.forward_oot = rope_forward
```
You are monkey-patching `RotaryEmbedding.forward_oot`, but `RotaryEmbedding` instances are called via their `forward` method. To correctly override the rotary embedding logic, you should patch `RotaryEmbedding.forward`. Using `forward_oot` will not have the desired effect as it's not the standard method called.
Suggested change:
```diff
-RotaryEmbedding.forward_oot = rope_forward
+RotaryEmbedding.forward = rope_forward
```
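As a sanity check on the mechanism itself: assigning the replacement function to the class attribute is enough for every instance to pick it up, because `nn.Module.__call__` dispatches through `self.forward`, which resolves on the class. A toy sketch with a made-up module follows; the real `RotaryEmbedding` class lives in vLLM and is not reproduced here.

```python
import torch
import torch.nn as nn


class ToyRotaryEmbedding(nn.Module):
    def forward(self, positions: torch.Tensor, query: torch.Tensor,
                key: torch.Tensor):
        # Original (placeholder) behaviour: pass-through.
        return query, key


def patched_forward(self, positions: torch.Tensor, query: torch.Tensor,
                    key: torch.Tensor):
    # Replacement logic; existing and future instances both use it because
    # the attribute is rebound on the class, not on a single instance.
    return query * 2, key * 2


ToyRotaryEmbedding.forward = patched_forward

rope = ToyRotaryEmbedding()
q, k = rope(torch.zeros(1), torch.ones(2, 4), torch.ones(2, 4))
assert torch.equal(q, torch.full((2, 4), 2.0))
```

Whether `forward` or the platform-specific `forward_oot` hook is the right attribute to override ultimately depends on how vLLM dispatches on this backend; the snippet only demonstrates the class-level patching mechanism.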
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Force-pushed from d2462c9 to 0739169
Codecov Report

❌ Patch coverage is 77.67%. Your patch check has failed because the patch coverage (77.67%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #2337      +/-   ##
==========================================
- Coverage   76.09%   75.77%   -0.32%
==========================================
  Files         114      118       +4
  Lines       13103    13731     +628
==========================================
+ Hits         9971    10405     +434
- Misses       3132     3326     +194
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Force-pushed from 59a4e7c to f759f5e
2. add rope_forward ut
Signed-off-by: taoyuxiang <[email protected]>
Force-pushed from bd35c24 to 72cdec2
This pull request has conflicts, please resolve those before we can evaluate the pull request.
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?